library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.6
## ✔ tidyr 0.8.1 ✔ stringr 1.2.0
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(gapminder)
Switch focus to exploring aesthetic mappings, instead of geoms.
Plot gdpPercap vs lifeExp with a categorical variable (continent) as shape.
gvsl <- ggplot(gapminder, aes(gdpPercap, lifeExp)) +
  scale_x_log10()
gvsl + geom_point(aes(shape = continent), alpha = 0.2)
What about pch?
# pch is the base R way of indicating shape
gvsl + geom_point(shape = 7)
gvsl + geom_point(pch = 7)
gvsl + geom_point(shape = "%")
List of shapes can be found at the bottom of the scale_shape documentation.
Make a scatterplot. Then:
gvsl + geom_point(aes(color = continent))
gvsl + geom_point(aes(color = pop))
Try both colour and color. Try trans="log10" for a log scale.
gvsl + geom_point(aes(color = pop)) + scale_color_continuous(trans = "log10")
gvsl + geom_point(aes(color = lifeExp > 60))
Make a line plot of gdpPercap over time for all countries. Colour by lifeExp > 60 (remember that lifeExp looks bimodal?)
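One possible take on the line plot described above, as a rough sketch rather than the official answer. The group = country mapping and the log y-axis are assumptions made here; the exercise itself only asks for lines coloured by lifeExp > 60.

```r
library(ggplot2)
library(gapminder)

# One line per country: group = country keeps each country's observations
# connected separately, while the colour follows the logical lifeExp > 60.
gdp_lines <- ggplot(gapminder, aes(year, gdpPercap, group = country)) +
  geom_line(aes(colour = lifeExp > 60), alpha = 0.5) +
  scale_y_log10()  # assumption: log scale, since gdpPercap is heavily skewed

gdp_lines
```

Because the colour is mapped to a condition that changes over time, a single country's line can switch colour partway along.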
Try adding colour to a histogram. How is this different?
ggplot(gapminder, aes(lifeExp)) +
geom_histogram(aes(color=continent))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(gapminder, aes(lifeExp)) +
geom_histogram(aes(fill=continent))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The histograms above show overplotting: too much information on one plot. Because the bars are stacked by default, it is not apparent whether every continent's bins reach down to the baseline.
Facetting fixes this problem. Make histograms of gdpPercap for each continent. Try the scales and ncol arguments.
ggplot(gapminder, aes(lifeExp)) +
facet_wrap( ~ continent) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(gapminder, aes(lifeExp)) +
facet_wrap(~ continent, scales = 'free_x') +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# free_x allows an individual x-axis scale for each subplot
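The examples above facet lifeExp; the prompt asks for gdpPercap and for the ncol argument too. A minimal sketch of that variant follows. The choices of bins = 30, a log x-axis, scales = "free_y", and ncol = 2 are illustrative assumptions, not values from the notes.

```r
library(ggplot2)
library(gapminder)

# Histograms of gdpPercap, one panel per continent.
# scales = "free_y" lets each panel pick its own count axis;
# ncol = 2 arranges the five panels in two columns.
gdp_hist <- ggplot(gapminder, aes(gdpPercap)) +
  geom_histogram(bins = 30) +
  scale_x_log10() +
  facet_wrap(~ continent, scales = "free_y", ncol = 2)

gdp_hist
```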
Remove Oceania. Add another variable: lifeExp > 60.
ggplot(gapminder, aes(gdpPercap)) +
facet_grid(continent ~ lifeExp > 60) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# the TRUE/FALSE column labels indicate whether lifeExp > 60
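The plot above still includes Oceania. A sketch of the "Remove Oceania" step, assuming a dplyr filter() before plotting is what was intended (the name no_oceania is made up for this example):

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Drop Oceania's rows first; facet_grid() then omits the unused level.
no_oceania <- gapminder %>%
  filter(continent != "Oceania")

gdp_grid <- ggplot(no_oceania, aes(gdpPercap)) +
  facet_grid(continent ~ lifeExp > 60) +
  geom_histogram(bins = 30)

gdp_grid
```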
Add a size aesthetic to a scatterplot. What about cex?
gvsl + geom_point(aes(size = pop), alpha = 0.2)
gvsl + geom_point(aes(size = pop), alpha = 0.2) +
scale_size_area()
Try scale_radius() and scale_size_area(). What’s better?
Use shape=21 to distinguish between fill (interior) and colour (exterior).
gvsl + geom_point(aes(size = pop, fill = continent), shape = 21, color = "black", alpha = 0.3)
# notice that shape = 21 is set outside the aes() mapping
Let’s try plotting much of the data.
gvsl + geom_point(aes(size=pop, color=continent)) +
scale_size_area() +
facet_wrap(~ year)
(x and y aesthetics)
Let’s see how Rwanda’s life expectancy and GDP per capita have evolved over time, using a path plot.
Try geom_line(). Try geom_point().
gapminder %>%
filter(country == "Rwanda") %>%
ggplot(aes(gdpPercap, lifeExp)) +
scale_x_log10() +
geom_point()
gapminder %>%
filter(country == "Rwanda") %>%
arrange(year) %>%
ggplot(aes(gdpPercap, lifeExp)) +
scale_x_log10() +
geom_point() +
geom_path() +
geom_path(arrow=arrow())
gapminder %>%
filter(country == "Rwanda") %>%
ggplot(aes(gdpPercap, lifeExp)) +
scale_x_log10() +
geom_point() +
geom_line()
Add arrow=arrow() option.
Add geom_text, with year label.
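The last two steps above have no code in these notes, so here is a hedged sketch of both together. The vjust and size values are arbitrary choices made here to keep the labels off the points.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Rwanda's path through (gdpPercap, lifeExp) space, ordered by year,
# with an arrowhead on the path and each point labelled with its year.
rwanda_path <- gapminder %>%
  filter(country == "Rwanda") %>%
  arrange(year) %>%
  ggplot(aes(gdpPercap, lifeExp)) +
  scale_x_log10() +
  geom_point() +
  geom_path(arrow = arrow()) +
  geom_text(aes(label = year), vjust = -0.8, size = 3)

rwanda_path
```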
Try cyl (number of cylinders) ~ am (transmission) in the mtcars data frame.
ggplot(mtcars, aes(cyl, am)) +
geom_point()
ggplot(mtcars, aes(cyl, am)) +
geom_jitter()
Try geom_count().
ggplot(mtcars, aes(cyl, am)) +
geom_count()
Try geom_bin2d(). Compare with geom_tile() with fill aes.
ggplot(mtcars, aes(factor(cyl), factor(am))) +
geom_bin2d()
ggplot(mtcars, aes(factor(cyl), factor(am))) +
geom_tile()
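The geom_tile() call above has no fill aesthetic, so every tile looks the same. For the comparison with geom_bin2d(), one approach (an assumption about what was intended) is to count the combinations first with dplyr::count() and map the resulting n column to fill:

```r
library(dplyr)
library(ggplot2)

# Count the cars in each cyl/am combination; count() names the result n.
cyl_am_counts <- mtcars %>%
  count(cyl, am)

# One tile per combination, shaded by how many cars fall in it.
tile_plot <- ggplot(cyl_am_counts, aes(factor(cyl), factor(am))) +
  geom_tile(aes(fill = n))

tile_plot
```

Unlike geom_bin2d(), geom_tile() does no counting of its own, which is why the summary has to be computed beforehand.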
Try a scatterplot with geom_hex(), geom_density2d(), and geom_smooth().
library(hexbin)
gvsl + geom_hex()
gvsl + geom_smooth(alpha = 0.2)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
gvsl + geom_point(alpha = 0.2) + geom_smooth(method = 'lm')
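geom_density2d() is mentioned above but not tried. A minimal sketch, rebuilding the same base plot as gvsl so the snippet runs on its own (the name density_plot is made up here):

```r
library(ggplot2)
library(gapminder)

# Same base as gvsl: gdpPercap on a log x-axis against lifeExp.
# The contour lines trace a 2D density estimate over the faded points.
density_plot <- ggplot(gapminder, aes(gdpPercap, lifeExp)) +
  scale_x_log10() +
  geom_point(alpha = 0.2) +
  geom_density2d()

density_plot
```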
How many countries are in each continent? Use the year 2007.
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = continent)) +
  geom_bar()
geom_col()
## geom_col: width = NULL, na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_stack
ggplot2 doesn’t make it obvious how to change to proportion. Try adding a y aesthetic: y=..count../sum(..count..).
Uses of bar plots: Get a sense of relative quantities of categories, or see the probability mass function of a categorical random variable.
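The y=..count../sum(..count..) suggestion above can be sketched as follows. This keeps the ..count.. notation from these notes; newer ggplot2 releases write the same idea as after_stat(count) / sum(after_stat(count)) and may warn about the older form. The axis label is an addition made here.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Bar heights become each continent's share of the countries in 2007:
# the per-bar count divided by the total count across all bars.
prop_bar <- gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = continent)) +
  geom_bar(aes(y = ..count.. / sum(..count..))) +
  ylab("proportion of countries")

prop_bar
```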
Add coord_polar() to a scatterplot.
gvsl + geom_point() + coord_polar()
If you’d like some practice, give these exercises a try.
Exercise 1: Make a plot of year (x) vs lifeExp (y), with points coloured by continent. Then, to that same plot, fit a straight regression line to each continent, without the error bars. If you can, try piping the data frame into the ggplot function.
ggplot(gapminder, aes(year, lifeExp)) +
geom_point(aes(color=continent)) +
geom_smooth(aes(line=continent))
## Warning: Ignoring unknown aesthetics: line
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(gapminder, aes(year, lifeExp, group = continent)) +
geom_point(aes(color=continent)) +
geom_smooth(method = 'lm', se = TRUE)
# the grey band is a confidence interval, drawn because se = TRUE
ggplot(gapminder, aes(year, lifeExp, color = continent)) +
geom_point(aes(color=continent)) +
geom_smooth(method = 'lm', se = FALSE)
Exercise 2: Repeat Exercise 1, but switch the regression line and geom_point layers. How is this plot different from that of Exercise 1?
ggplot(gapminder, aes(year, lifeExp, color = continent)) +
geom_smooth(method = 'lm', se = FALSE) +
geom_point(aes(color=continent))
Exercise 3: Omit the geom_point layer from either of the above two plots (it doesn’t matter which). Does the line still show up, even though the data aren’t shown? Why or why not?
ggplot(gapminder, aes(year, lifeExp, color = continent)) +
geom_smooth(method = 'lm', se = FALSE)
Exercise 4: Make a plot of year (x) vs lifeExp (y), facetted by continent. Then, fit a smoother through the data for each continent, without the error bars. Choose a span that you feel is appropriate.
ggplot(gapminder, aes(year, lifeExp, color = continent)) +
facet_wrap(~continent) +
geom_smooth(method = 'lm', se = FALSE) +
geom_point()
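The attempt above fits a straight line (method = 'lm'), whereas Exercise 4 asks for a smoother with a chosen span. A sketch using loess instead; span = 0.9 is an arbitrary illustrative value, picked on the assumption that with only 12 distinct years a fairly wide window is the safer choice.

```r
library(ggplot2)
library(gapminder)

# One panel per continent, with a loess smoother and no error band.
# Larger span values give a smoother curve; smaller ones follow the data more closely.
smooth_plot <- ggplot(gapminder, aes(year, lifeExp)) +
  facet_wrap(~ continent) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "loess", span = 0.9, se = FALSE)

smooth_plot
```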
Exercise 5: Plot the population over time (year) using lines, so that each country has its own line. Colour by gdpPercap. Add alpha transparency to your liking.
Exercise 6: Add points to the plot in Exercise 5.
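Exercises 5 and 6 have no attempt in these notes. One possible sketch covering both; the log y-axis and alpha = 0.4 are choices made here, not part of the exercise text.

```r
library(ggplot2)
library(gapminder)

# Exercise 5: one population line per country, coloured by gdpPercap.
pop_lines <- ggplot(gapminder, aes(year, pop, group = country, colour = gdpPercap)) +
  geom_line(alpha = 0.4) +
  scale_y_log10()  # assumption: log scale, since populations span several orders of magnitude

# Exercise 6: the same plot with points added as a second layer.
pop_lines_points <- pop_lines + geom_point(alpha = 0.4)

pop_lines_points
```

Since colour is set in the top-level aes(), the added points inherit it without any extra mapping.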